{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Structure Learning in Bayesian Networks"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this notebook, we show examples for using the Structure Learning Algorithms in pgmpy. Currently, pgmpy has implementation of 3 main algorithms:\n",
    "1. PC with stable and parallel variants.\n",
    "2. Hill-Climb Search\n",
    "3. Exhaustive Search\n",
    "\n",
    "For PC the following conditional independence test can be used:\n",
    "1. Chi-Square test (https://en.wikipedia.org/wiki/Chi-squared_test)\n",
    "2. Pearsonr (https://en.wikipedia.org/wiki/Partial_correlation#Using_linear_regression)\n",
    "3. G-squared (https://en.wikipedia.org/wiki/G-test)\n",
    "4. Log-likelihood (https://en.wikipedia.org/wiki/G-test)\n",
    "5. Freeman-Tuckey (Read, Campbell B. \"Freeman—Tukey chi-squared goodness-of-fit statistics.\" Statistics & probability letters 18.4 (1993): 271-278.)\n",
    "6. Modified Log-likelihood\n",
    "7. Neymann (https://en.wikipedia.org/wiki/Neyman%E2%80%93Pearson_lemma)\n",
    "8. Cressie Read (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464)\n",
    "9. Power Divergence (Cressie, Noel, and Timothy RC Read. \"Multinomial goodness‐of‐fit tests.\" Journal of the Royal Statistical Society: Series B (Methodological) 46.3 (1984): 440-464.)\n",
    "\n",
    "For Hill-Climb and Exhausitive Search the following scoring methods can be used:\n",
    "1. K2 Score\n",
    "2. BDeu Score\n",
    "3. Bic Score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate some data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from itertools import combinations\n",
    "\n",
    "import networkx as nx\n",
    "from sklearn.metrics import f1_score\n",
    "\n",
    "from pgmpy.estimators import PC, HillClimbSearch, ExhaustiveSearch\n",
    "from pgmpy.estimators import K2Score\n",
    "from pgmpy.utils import get_example_model\n",
    "from pgmpy.sampling import BayesianModelSampling"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Generating for node: CVP: 100%|██████████| 37/37 [00:00<00:00, 544.13it/s]\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>HISTORY</th>\n",
       "      <th>CVP</th>\n",
       "      <th>PCWP</th>\n",
       "      <th>HYPOVOLEMIA</th>\n",
       "      <th>LVEDVOLUME</th>\n",
       "      <th>LVFAILURE</th>\n",
       "      <th>STROKEVOLUME</th>\n",
       "      <th>ERRLOWOUTPUT</th>\n",
       "      <th>HRBP</th>\n",
       "      <th>HREKG</th>\n",
       "      <th>...</th>\n",
       "      <th>MINVOLSET</th>\n",
       "      <th>VENTMACH</th>\n",
       "      <th>VENTTUBE</th>\n",
       "      <th>VENTLUNG</th>\n",
       "      <th>VENTALV</th>\n",
       "      <th>ARTCO2</th>\n",
       "      <th>CATECHOL</th>\n",
       "      <th>HR</th>\n",
       "      <th>CO</th>\n",
       "      <th>BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TRUE</td>\n",
       "      <td>LOW</td>\n",
       "      <td>LOW</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>LOW</td>\n",
       "      <td>TRUE</td>\n",
       "      <td>LOW</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>...</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>LOW</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>LOW</td>\n",
       "      <td>LOW</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>...</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>LOW</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>LOW</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>...</td>\n",
       "      <td>LOW</td>\n",
       "      <td>LOW</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>...</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>LOW</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>LOW</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>FALSE</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>...</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>NORMAL</td>\n",
       "      <td>ZERO</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>LOW</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>HIGH</td>\n",
       "      <td>LOW</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 37 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  HISTORY     CVP    PCWP HYPOVOLEMIA LVEDVOLUME LVFAILURE STROKEVOLUME  \\\n",
       "0    TRUE     LOW     LOW       FALSE        LOW      TRUE          LOW   \n",
       "1   FALSE  NORMAL  NORMAL       FALSE     NORMAL     FALSE       NORMAL   \n",
       "2   FALSE  NORMAL  NORMAL       FALSE     NORMAL     FALSE       NORMAL   \n",
       "3   FALSE  NORMAL  NORMAL       FALSE     NORMAL     FALSE         HIGH   \n",
       "4   FALSE  NORMAL  NORMAL       FALSE     NORMAL     FALSE       NORMAL   \n",
       "\n",
       "  ERRLOWOUTPUT  HRBP   HREKG  ... MINVOLSET VENTMACH VENTTUBE VENTLUNG  \\\n",
       "0        FALSE  HIGH  NORMAL  ...    NORMAL   NORMAL      LOW     ZERO   \n",
       "1        FALSE  HIGH    HIGH  ...    NORMAL   NORMAL      LOW     ZERO   \n",
       "2        FALSE  HIGH    HIGH  ...       LOW      LOW     ZERO     ZERO   \n",
       "3        FALSE  HIGH    HIGH  ...      HIGH     HIGH     HIGH      LOW   \n",
       "4        FALSE  HIGH    HIGH  ...    NORMAL   NORMAL     ZERO     HIGH   \n",
       "\n",
       "  VENTALV ARTCO2 CATECHOL    HR    CO    BP  \n",
       "0    ZERO   HIGH     HIGH  HIGH   LOW   LOW  \n",
       "1     LOW   HIGH     HIGH  HIGH  HIGH  HIGH  \n",
       "2    ZERO   HIGH     HIGH  HIGH  HIGH  HIGH  \n",
       "3    HIGH    LOW     HIGH  HIGH  HIGH  HIGH  \n",
       "4     LOW   HIGH     HIGH  HIGH  HIGH   LOW  \n",
       "\n",
       "[5 rows x 37 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model = get_example_model(\"alarm\")\n",
    "samples = BayesianModelSampling(model).forward_sample(size=int(1e3))\n",
    "samples.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Funtion to evaluate the learned model structures.\n",
    "def get_f1_score(estimated_model, true_model):\n",
    "    nodes = estimated_model.nodes()\n",
    "    est_adj = nx.to_numpy_array(\n",
    "        estimated_model.to_undirected(), nodelist=nodes, weight=None\n",
    "    )\n",
    "    true_adj = nx.to_numpy_array(\n",
    "        true_model.to_undirected(), nodelist=nodes, weight=None\n",
    "    )\n",
    "\n",
    "    f1 = f1_score(np.ravel(true_adj), np.ravel(est_adj))\n",
    "    print(\"F1-score for the model skeleton: \", f1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learn the model structure using PC"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00,  2.89s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F1-score for the model skeleton:  0.7777777777777779\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "est = PC(data=samples)\n",
    "estimated_model = est.estimate(variant=\"stable\", max_cond_vars=4)\n",
    "get_f1_score(estimated_model, model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Working for n conditional variables: 4: 100%|██████████| 4/4 [00:11<00:00,  2.88s/it]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F1-score for the model skeleton:  0.7777777777777779\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "est = PC(data=samples)\n",
    "estimated_model = est.estimate(variant=\"orig\", max_cond_vars=4)\n",
    "get_f1_score(estimated_model, model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Learn the model structure using Hill-Climb Search"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "  1%|          | 55/10000 [00:16<50:30,  3.28it/s]  "
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "F1-score for the model skeleton:  0.84\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "scoring_method = K2Score(data=samples)\n",
    "est = HillClimbSearch(data=samples)\n",
    "estimated_model = est.estimate(\n",
    "    scoring_method=scoring_method, max_indegree=4, max_iter=int(1e4)\n",
    ")\n",
    "get_f1_score(estimated_model, model)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}